Abstract



The present paper studies continuity of generalized entropy functions
and relative entropies defined using the notion of a deformed
logarithmic function. In particular, two distinct definitions of relative
entropy are discussed. As an application, all considered entropies are
shown to satisfy Lesche's stability condition. The entropies of Tsallis'
nonextensive thermostatistics are taken as examples.

1 Introduction



The discrete entropy functional

$$ I_0(p) = -\sum_k p_k \log(p_k) \tag{1} $$

is not continuous in the total variation norm

$$ \|p - q\|_1 = \sum_k |p_k - q_k| \tag{2} $$

in case the number of microstates k is infinite. This means that a small
change in the probability distribution may cause an arbitrarily large
change in entropy. This discontinuity has been identified recently [1] as
an essential characteristic of information content in natural languages.
But its occurrence can make it difficult to obtain a reliable estimate of
entropy from experimental observation. In many cases the probabilities
$p_k$ are defined over a finite index set k = 1, 2, ..., N. Then uniform
continuity holds and a useful estimate, called Lesche's stability
condition [2], exists (see expression (55)). The inequality was already
known, since Fannes [3] proved the quantum version of the inequality about
ten years earlier. However, Lesche formulated the inequality as a
condition which is satisfied by (1) but not by the alpha-entropies of
Renyi [4]. Recently [5], it has been shown that the q-entropies of
Tsallis' nonextensive thermostatistics also satisfy Lesche's condition.
Here, we generalize this proof to a large class of entropy functions, and
formulate a more general continuity estimate.

It has long been known that in (1) the natural logarithm may be replaced
by an arbitrary increasing function $f(x)$. The entropy of the discrete
probability distribution function (pdf) p then reads

$$ I_f(p) = -\sum_k p_k f(p_k). \tag{3} $$

In the terminology of [6, 7, 8] these are quasi-entropies. It is clear that
for general functions $f(x)$ not much can be said about continuity of
entropy or relative entropy. It is natural to require that $f(x)$ shares
some of the properties of the natural logarithm. A class of functions
satisfying such extra conditions has been introduced recently [9]. They
have been used as the basis for a broad generalization of thermostatistics
[10, 11]. The present paper focuses on entropy functionals occurring in
this generalized thermostatistics.

A possible generalization of relative entropy, also called divergence
[12], is f-divergence [13, 14], defined by

$$ I(p\|q) = \sum_k q_k f(p_k/q_k), \tag{4} $$

with $f(x)$ a convex function, defined for x > 0, strictly convex at
x = 1. The ratio $p_k/q_k$ can be seen as the discrete Radon-Nikodym
derivative of p w.r.t. q. The latter has been the basis for a systematic
generalization to the context of quantum mechanics (see chapter 5 of
[7]). Alternative expressions of the form

$$ D(p\|q) = \sum_k \bigl[ f(p_k) - f(q_k) - (p_k - q_k) f'(q_k) \bigr], \tag{5} $$

with $f'(x)$ the derivative of $f(x)$, are called divergences of the
Bregman type in the mathematics literature. In the original definition
[15] the pdfs p and q are interchanged. Then (4) and (5) are identical in
case $f(x) = x\log(x)$. Hence, in the standard theory there is no need to
distinguish between the two forms. To clarify why both are needed let us
remark that mean entropy, in contrast with dynamical entropy, is negative
relative entropy w.r.t. some reference state. If the number N of
microstates is finite then entropy is relative entropy w.r.t. uniform
probabilities $q_k = 1/N$:

$$ -I(p\|q) = -\frac{1}{N} \sum_k f(N p_k). \tag{6} $$

The continuum limit of (6) becomes

$$ -I(p\|q) = -\int_0^1 dx\, f(p(x)) \tag{7} $$

for any probability measure p with density function $p(x)$ w.r.t. the
Lebesgue measure dx of [0, 1]. This continuum limit makes clear why a
definition of relative entropy of the form (4) is needed. In what follows,
the definition of generalized entropy that will be used is

$$ I(p) = -\sum_k f(p_k). \tag{8} $$

By omitting the factors N from (6) the explicit dependence on the number
of microstates disappears and the expression is of the form (1). In
particular, if $f(x) = x\log(x)$ then $I(p)$ coincides with $I_0(p)$.

There also exist situations where a divergence of the form (5) is needed.
In (generalized) statistical mechanics relative entropy $D(p\|q)$ measures
the difference in free energy between an arbitrary pdf p and the
equilibrium pdf q. The quantity $-f'(q_k)$ equals the energy of the k-th
microstate divided by temperature (up to a constant term). Hence,

$$ -\sum_k p_k f'(q_k) - I(p) \tag{9} $$

is the (non-equilibrium) free energy of p divided by temperature T (again
up to a constant term). Then $D(p\|q) \ge 0$ expresses that free energy as
a function of the pdf p is minimal at equilibrium p = q.
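
The step from the Bregman-type divergence (5) and the entropy (8) to this free-energy statement can be spelled out in one line. The symbol $\Phi$ for expression (9) is introduced here only for illustration and does not appear in the original text; the final inequality is the convexity of $f$.

```latex
\begin{align*}
\Phi(p) &:= -\sum_k p_k f'(q_k) - I(p)
         = \sum_k \bigl[ f(p_k) - p_k f'(q_k) \bigr], \\
\Phi(p) - \Phi(q)
        &= \sum_k \bigl[ f(p_k) - f(q_k) - (p_k - q_k) f'(q_k) \bigr]
         = D(p\|q) \;\ge\; 0 .
\end{align*}
```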

In information theory the linking identity connects average code length,
entropy and divergence:

$$ \langle \kappa, p \rangle = I(p) + D(p\|q). \tag{10} $$

See e.g. [1]. Here, divergence measures the redundancy of the code
$\kappa$ against the pdf p. From (10) follows

$$ \langle \kappa, p \rangle - \langle \kappa, q \rangle = I(p) + D(p\|q) - I(q), \tag{11} $$

which can be identified with (5), provided that the average code length is
given by

$$ \langle \kappa, p \rangle = -\sum_k p_k f'(q_k) + C, \tag{12} $$

with C a suitably chosen constant.

The paper is organized as follows. The next section gives a short review
of deformed exponentials and logarithms. Sections 3, 4, and 5 discuss the
definitions of entropy and relative entropy. Continuity estimates for
entropy and relative entropy are given in section 6. Finally, Lesche's
stability condition is discussed in sections 7 and 8. The paper is
concluded with a short discussion of results, followed by appendices
containing proofs of inequalities.



2 Deformed exponentials and logarithms 

In [9], a deformed logarithm is defined as a strictly increasing concave
function, defined for all x > 0, vanishing for x = 1. Following [10] it is
written as

$$ \ln_\phi(x) = \int_1^x dy\, \frac{1}{\phi(y)} \tag{13} $$

with $\phi(y)$ a strictly positive increasing function. For convenience,
the integral of $\ln_\phi(x)$ is denoted

$$ F_\phi(x) = \int_1^x dy\, \ln_\phi(y) = \int_1^x dy\, \frac{x - y}{\phi(y)}. \tag{14} $$

The possible divergence of $\ln_\phi(x)$ at x = 0 should be mild enough so
that $F_\phi(0)$ is finite. The inverse function is the deformed
exponential $\exp_\phi(x)$ and is defined on the range of $\ln_\phi(x)$,
which may be less than the whole real line. If needed, the domain of
definition is extended by putting $\exp_\phi(x) = 0$ if x is too small,
and $\exp_\phi(x) = +\infty$ if x is too large.

For further use the notion of deduced logarithmic function
$\omega_\phi(x)$, associated with $\ln_\phi(x)$, is needed. It is defined
by

$$ \omega_\phi(x) = (x - 1) F_\phi(0) - x F_\phi(1/x)
   = x \int_0^{1/x} dy\, \bigl[ -\ln_\phi(y) - F_\phi(0) \bigr]. \tag{15} $$

It is again a deformed logarithm provided that

$$ \int_0^1 dx\, \ln_\phi(1/x) < +\infty. \tag{16} $$

The name of $\kappa$-deformed logarithm is used in [9] and, with a more
restricted meaning, in [16, 17]. To avoid confusion this name is used in
the present paper only with the latter restricted meaning. Its origin is
the kappa-distribution, which is a generalization of the Maxwell
distribution. This distribution is given by

$$ p(v) = A \left[ 1 + \frac{1}{2\kappa} \frac{v^2}{v_0^2} \right]^{-1-\kappa} \tag{17} $$

and can be written as $p(v) = A \exp_\phi(-(1/2) v^2/v_0^2)$ with the
deformed logarithm $\ln_\phi(x)$ defined by

$$ \ln_\phi(x) = \kappa \bigl( 1 - x^{-1/(1+\kappa)} \bigr)
   = \frac{\kappa}{1+\kappa} \int_1^x dy\, y^{-(2+\kappa)/(1+\kappa)},
   \qquad \kappa > 0. \tag{18} $$



As a simple example of deformed exponential and logarithmic functions,
consider the piecewise linear functions determined by the values

$$ \exp_\phi(n) = a^n, \qquad n \in \mathbb{Z}, \tag{19} $$

with a > 1 any base number. But also the function
$\ln_\phi(x) = -1 + \sqrt{x}$ is a deformed logarithm. Its inverse is
given by $\exp_\phi(x) = 0$ if x < -1, and $\exp_\phi(x) = (1 + x)^2$
otherwise.
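
The square-root example can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the choice $\phi(y) = 2\sqrt{y}$ is not stated in the text but follows from differentiating $-1 + \sqrt{x}$. It compares the closed form with a midpoint-rule evaluation of the defining integral (13) and verifies that the extended exponential inverts the logarithm.

```python
import math


def ln_phi(x: float) -> float:
    """Deformed logarithm ln_phi(x) = -1 + sqrt(x), for x > 0."""
    return -1.0 + math.sqrt(x)


def exp_phi(u: float) -> float:
    """Deformed exponential: 0 below the range of ln_phi, (1 + u)^2 otherwise."""
    return 0.0 if u < -1.0 else (1.0 + u) ** 2


def ln_phi_by_integration(x: float, steps: int = 20000) -> float:
    """Midpoint-rule estimate of the integral of 1/phi(y) from 1 to x.

    Assumes phi(y) = 2*sqrt(y), the reciprocal of the derivative of ln_phi.
    The step h is negative when x < 1, which handles both directions.
    """
    h = (x - 1.0) / steps
    return sum(h / (2.0 * math.sqrt(1.0 + (i + 0.5) * h)) for i in range(steps))


if __name__ == "__main__":
    for x in (0.25, 1.0, 4.0):
        print(x, ln_phi(x), ln_phi_by_integration(x), exp_phi(ln_phi(x)))
```

The extension $\exp_\phi(u) = 0$ for $u < -1$ is what makes the inverse defined on the whole real line, since $\ln_\phi$ only takes values above $-1$.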



3 Entropy 

The entropy $I_\phi(p)$ of a discrete pdf p is defined by means of the
deduced logarithmic function $\omega_\phi(x)$, rather than by the deformed
logarithm $\ln_\phi(x)$. The reason for doing so is that the derivative of
$\omega_\phi(x)$ exists and can be calculated in terms of $\ln_\phi(x)$
while not much is known in general about the derivative $1/\phi(x)$ of the
function $\ln_\phi(x)$. The definition of entropy functional reads

$$ I_\phi(p) = \sum_k p_k\, \omega_\phi(1/p_k) \le +\infty. \tag{20} $$

Note that the function $x\,\omega_\phi(1/x)$ is non-negative and goes to
zero in the limit x = 0. Hence the expression is well-defined. Basic
properties are $I_\phi(p) \ge 0$ and

$$ I_\phi(\lambda p + (1 - \lambda) q) \ge \lambda I_\phi(p) + (1 - \lambda) I_\phi(q),
   \qquad 0 \le \lambda \le 1, \tag{21} $$

i.e. entropy $I_\phi(p)$ is a concave function of the pdf p.

From the definition of the deduced logarithmic function $\omega_\phi(x)$
follows that

$$ I_\phi(p) = \sum_k \bigl[ (1 - p_k) F_\phi(0) - F_\phi(p_k) \bigr]
   = -F_\phi(0) - \sum_k \int_0^{p_k} dx\, \ln_\phi(x). \tag{22} $$

In particular, $I_\phi(p)$ is of the form (8) with

$$ f(x) = F_\phi(x) - (1 - x) F_\phi(0). \tag{23} $$

Let us discuss some examples. If $\ln_\phi(x)$ is replaced by the natural
logarithm log(x) then the entropy is denoted $I_0(p)$ and is given by the
well-known expression (1). As a further example, consider entropy in the
context of Tsallis' non-extensive thermodynamics [18]. Fix a number
$\kappa$ between -1 and 1, not equal to 0. A deformed logarithm is defined
by

$$ \ln_\phi(x) = (1 + \kappa^{-1}) (x^\kappa - 1)
   = (1 + \kappa) \int_1^x dy\, y^{\kappa - 1}. \tag{24} $$

Note that this definition differs from the definition of q-logarithm found
in the Tsallis literature [18], which coincides with the deduced logarithm

$$ \omega_\phi(x) = \frac{1}{\kappa} \bigl( 1 - x^{-\kappa} \bigr). \tag{25} $$

A short calculation yields the entropy functional

$$ I_\phi(p) = \frac{1}{\kappa} \Bigl( 1 - \sum_k p_k^{1+\kappa} \Bigr). \tag{26} $$

This entropy functional was studied long ago by Havrda and Charvat [12]
and by Daroczy [20]. It is a monotonic function of Renyi's alpha-entropies
[4]. It is the starting point of Tsallis' thermostatistics. In the latter
context it is common to use the parameter $q = 1 + \kappa$, instead of
$\kappa$. In the present paper the symbols p, q, and r are used for pdfs.
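
The chain from deformed logarithm to deduced logarithm to entropy can be traced numerically for the Tsallis case. The following is a minimal sketch under stated assumptions: the function names are hypothetical, `tsallis_F` is the closed-form integral of the Tsallis deformed logarithm from 1 to x worked out for this example, and the entropy is evaluated through the general definition (20) rather than through a closed form.

```python
import math


def tsallis_ln(x: float, kappa: float) -> float:
    """Deformed logarithm (1 + 1/kappa) * (x**kappa - 1)."""
    return (1.0 + 1.0 / kappa) * (x ** kappa - 1.0)


def tsallis_F(x: float, kappa: float) -> float:
    """Integral of tsallis_ln from 1 to x (closed form, worked out by hand)."""
    return (x ** (1.0 + kappa) - 1.0) / kappa - (1.0 + 1.0 / kappa) * (x - 1.0)


def deduced_log(x: float, kappa: float) -> float:
    """Deduced logarithm omega(x) = (x - 1) F(0) - x F(1/x)."""
    return (x - 1.0) * tsallis_F(0.0, kappa) - x * tsallis_F(1.0 / x, kappa)


def entropy_from_deduced_log(p, kappa: float) -> float:
    """Entropy sum_k p_k * omega(1/p_k); zero-probability states contribute 0."""
    return sum(pk * deduced_log(1.0 / pk, kappa) for pk in p if pk > 0.0)


def shannon_entropy(p) -> float:
    """Entropy with the natural logarithm, for comparison at small kappa."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0.0)


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    for kappa in (0.5, -0.5, 1e-6):
        print(kappa, entropy_from_deduced_log(p, kappa))
    print("Shannon:", shannon_entropy(p))
```

For small $\kappa$ the value approaches the entropy computed with the natural logarithm, which is the sense in which this family deforms the standard case.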

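As a final example, consider the $\kappa$-deformed logarithm introduced by
Kaniadakis [16, 17]:

$$ \ln_\kappa(x) = \frac{1}{2\kappa} \bigl( x^\kappa - x^{-\kappa} \bigr). \tag{27} $$

The parameter $\kappa$ should satisfy $-1 < \kappa < 1$ to guarantee
concavity of the deformed logarithm. The inverse function reads

$$ \exp_\kappa(x) = \Bigl( \kappa x + \sqrt{1 + \kappa^2 x^2} \Bigr)^{1/\kappa}. \tag{28} $$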
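
That the stated exponential inverts the Kaniadakis logarithm can be confirmed with a short round trip. This is an illustrative sketch with hypothetical function names; it assumes the standard form of the pair, $\ln_\kappa(x) = (x^\kappa - x^{-\kappa})/(2\kappa)$ and $\exp_\kappa(u) = (\kappa u + \sqrt{1 + \kappa^2 u^2})^{1/\kappa}$.

```python
import math


def ln_kappa(x: float, kappa: float) -> float:
    """Kaniadakis logarithm (x**kappa - x**(-kappa)) / (2*kappa), for x > 0."""
    return (x ** kappa - x ** (-kappa)) / (2.0 * kappa)


def exp_kappa(u: float, kappa: float) -> float:
    """Inverse of ln_kappa: (kappa*u + sqrt(1 + (kappa*u)**2)) ** (1/kappa)."""
    return (kappa * u + math.sqrt(1.0 + (kappa * u) ** 2)) ** (1.0 / kappa)


if __name__ == "__main__":
    for kappa in (0.3, -0.7):
        for x in (0.2, 1.0, 5.0):
            print(kappa, x, exp_kappa(ln_kappa(x, kappa), kappa))
```

The pair is symmetric under $\kappa \to -\kappa$, so only $|\kappa|$ matters in practice.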

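The corresponding entropy functional is obtained directly from (22). The
result is

$$ I_\kappa(p) = \frac{1}{2\kappa} \sum_k \left[ \frac{p_k^{1-\kappa}}{1-\kappa}
   - \frac{p_k^{1+\kappa}}{1+\kappa} \right] - \frac{1}{1-\kappa^2}. \tag{29} $$

4 Relative entropy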



Let q be a pdf for which $q_k > 0$ holds for all k (this condition can be
omitted if the deformed logarithm is such that $\omega_\phi(0)$ is
finite). From (4) follows that the relative entropy of the pdf p, given q,
is defined by

$$ I_\phi(p\|q) = -\sum_k p_k\, \omega_\phi(q_k/p_k). \tag{30} $$

Note that, using the definition of $\omega_\phi$, one obtains

$$ I_\phi(p\|q) = \sum_k \int_{q_k}^{p_k} dx\, \ln_\phi(x/q_k). \tag{31} $$

Expression (30) is of the form (4) with $f(x)$ given by (23). In
particular, this means that the divergence $I_\phi(p\|q)$, considered
here, is a special case of the f-divergence of [13, 14], with functions f
which are strictly convex and have a concave derivative. Many properties
of f-divergence are known (see [22]). In particular, one has
$I_\phi(p\|q) \ge 0$, and $I_\phi(p\|q) = 0$ implies p = q. Also,
$I_\phi(p\|q)$ is jointly convex in p and q.

For the example of Tsallis' entropy functional one obtains, using (25),

$$ I_\phi(p\|q) = \frac{1}{\kappa} \sum_k p_k \left[ \left( \frac{p_k}{q_k} \right)^{\kappa} - 1 \right]. \tag{32} $$

This expression has been introduced in the context of Tsallis'
thermostatistics independently by several authors [23, 24, 25]. However,
the definition was known before in the context of Renyi's alpha-entropies
(see [26]).

If $\ln_\phi(x)$ has a unique derivative $\ln'_\phi(x) = 1/\phi(x)$ in the
point x = 1 and the probabilities $p_k$ depend on parameters $\theta^i$
then the generalized Fisher information metric [27], defined by
$I_\phi(p + dp\|p) = I_\phi(p\|p + dp) = (1/2)\, g_{ij}(p)\, d\theta^i d\theta^j$,
becomes

$$ g_{ij}(p) = \ln'_\phi(1) \sum_k \frac{1}{p_k}
   \frac{\partial p_k}{\partial \theta^i} \frac{\partial p_k}{\partial \theta^j}. \tag{33} $$

Note that this expression does not depend on the actual choice of deformed
logarithm, except through the prefactor $\ln'_\phi(1)$.



5 Alternative definition of divergence 

So far, definition (30) seems quite satisfactory. However, as discussed in
the introduction, there is a need for an alternative definition of the
form (5). By modification of (31) one obtains

$$ \begin{aligned}
D_\phi(p\|q) &= \sum_k \int_{q_k}^{p_k} dx\, \bigl( \ln_\phi(x) - \ln_\phi(q_k) \bigr) \\
 &= \sum_k \bigl[ F_\phi(p_k) - F_\phi(q_k) - (p_k - q_k) \ln_\phi(q_k) \bigr] \\
 &= I_\phi(q) - I_\phi(p) - \sum_k (p_k - q_k) \ln_\phi(q_k).
\end{aligned} \tag{34} $$

This expression is of the form (5) with $f(x)$ given by (23). Positivity
of $D_\phi(p\|q)$ follows immediately because $\ln_\phi(x)$ is an
increasing function of x. Equality $D_\phi(p\|q) = 0$ implies that p = q.
Convexity in the first argument is straightforward. For the example of
Tsallis' entropy one obtains

$$ D_\phi(p\|q) = \frac{1}{\kappa} \sum_k p_k \bigl( p_k^\kappa - q_k^\kappa \bigr)
   - \sum_k (p_k - q_k)\, q_k^\kappa, \tag{35} $$

which is definitely different from (32).
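
The difference between the two Tsallis divergences is easy to see numerically. The sketch below is an illustration with hypothetical function names; it assumes the two expressions read $\frac{1}{\kappa}\sum_k p_k[(p_k/q_k)^\kappa - 1]$ for the f-divergence form and $\frac{1}{\kappa}\sum_k p_k(p_k^\kappa - q_k^\kappa) - \sum_k (p_k - q_k) q_k^\kappa$ for the Bregman form, and that all $q_k$ are strictly positive.

```python
import math


def csiszar_type_divergence(p, q, kappa: float) -> float:
    """f-divergence form: (1/kappa) * sum_k p_k * ((p_k/q_k)**kappa - 1)."""
    return sum(pk * ((pk / qk) ** kappa - 1.0) for pk, qk in zip(p, q)) / kappa


def bregman_type_divergence(p, q, kappa: float) -> float:
    """Bregman form: (1/kappa) sum p_k (p_k^k - q_k^k) - sum (p_k - q_k) q_k^k."""
    first = sum(pk * (pk ** kappa - qk ** kappa) for pk, qk in zip(p, q)) / kappa
    second = sum((pk - qk) * qk ** kappa for pk, qk in zip(p, q))
    return first - second


def kullback_leibler(p, q) -> float:
    """Standard relative entropy, the common limit of both forms as kappa -> 0."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0.0)


if __name__ == "__main__":
    p, q = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
    for kappa in (0.5, 1e-6):
        print(kappa,
              csiszar_type_divergence(p, q, kappa),
              bregman_type_divergence(p, q, kappa))
    print("KL:", kullback_leibler(p, q))
```

Both quantities are non-negative and vanish at p = q, yet they take different values for $\kappa \ne 0$; only in the limit of the natural logarithm do they merge into the standard relative entropy.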

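If the probabilities $p_k$ depend on parameters $\theta^i$ then the
generalized Fisher information metric becomes

$$ g_{ij}(p) = \sum_k \ln'_\phi(p_k)
   \frac{\partial p_k}{\partial \theta^i} \frac{\partial p_k}{\partial \theta^j}. \tag{36} $$

Indeed, one has

$$ \begin{aligned}
D_\phi(p + dp\|p) &= \sum_k \int_{p_k}^{p_k + dp_k} dx\, \bigl( \ln_\phi(x) - \ln_\phi(p_k) \bigr) \\
 &= \sum_k \int_{p_k}^{p_k + dp_k} dx\, \bigl[ \ln'_\phi(p_k) (x - p_k) + \cdots \bigr] \\
 &= \frac{1}{2} \sum_k \ln'_\phi(p_k) (dp_k)^2 + \cdots,
\end{aligned} \tag{37} $$

and similarly for $D_\phi(p\|p + dp)$. In contrast with (33) the metric
tensor (36) depends in a non-trivial way on the deformed logarithm
$\ln_\phi$.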



6 Continuity estimates of entropy and of relative entropy

In Appendix A it is proved that

$$ \begin{aligned}
|I_\phi(p) - I_\phi(q)| &\le -\sum_k \int_0^{|p_k - q_k|} \ln_\phi(x)\, dx \\
 &= \sum_k \bigl[ F_\phi(0) - F_\phi(|p_k - q_k|) \bigr] \\
 &\equiv d(p, q) \le +\infty.
\end{aligned} \tag{38} $$

The r.h.s. of (38) defines a metric $d(p, q)$. In particular, it satisfies
the triangle inequality. Note that the distance between two pdfs may be
infinite. This is not a problem since one can always define a new metric
by $d_M(x, y) = \min\{d(x, y), M\}$, with M a fixed positive constant. The
two metrics $d$ and $d_M$ define the same topology.

If $\ln_\phi(x)$ is taken to be the natural logarithm ln(x) then (38)
becomes

$$ |I_0(p) - I_0(q)| \le \|p - q\|_1 - \sum_k |p_k - q_k| \ln(|p_k - q_k|). \tag{39} $$

More generally, take $\ln_\phi$ equal to the logarithm (24), used in the
Tsallis context. Then (38) becomes

$$ |I_\phi(p) - I_\phi(q)| \le (1 + \kappa^{-1}) \|p - q\|_1
   - \frac{1}{\kappa} \sum_k |p_k - q_k|^{1+\kappa}. \tag{40} $$
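
The Tsallis continuity estimate can be probed on random distributions. The sketch below is a numerical illustration, not a proof: the function names are hypothetical, the entropy is taken in the closed form $\frac{1}{\kappa}(1 - \sum_k p_k^{1+\kappa})$, and the bound is taken as $(1 + \kappa^{-1})\|p - q\|_1 - \frac{1}{\kappa}\sum_k |p_k - q_k|^{1+\kappa}$.

```python
import random


def tsallis_entropy(p, kappa: float) -> float:
    """Closed form (1/kappa) * (1 - sum_k p_k**(1 + kappa))."""
    return (1.0 - sum(pk ** (1.0 + kappa) for pk in p)) / kappa


def distance_bound(p, q, kappa: float) -> float:
    """Right-hand side of the estimate: the distance d(p, q) for the Tsallis case."""
    diffs = [abs(pk - qk) for pk, qk in zip(p, q)]
    return ((1.0 + 1.0 / kappa) * sum(diffs)
            - sum(d ** (1.0 + kappa) for d in diffs) / kappa)


def random_pdf(n: int, rng: random.Random):
    """Normalize n uniform random weights into a probability vector."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)
    p, q = random_pdf(5, rng), random_pdf(5, rng)
    for kappa in (0.5, -0.5):
        gap = abs(tsallis_entropy(p, kappa) - tsallis_entropy(q, kappa))
        print(kappa, gap, distance_bound(p, q, kappa))
```

In such experiments the entropy difference is typically far below the bound; the estimate is a worst-case statement over all pairs at a given distance.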

Differences in relative entropy can be estimated in a way similar to
entropy differences. One finds (see Appendix A)

$$ \begin{aligned}
|I_\phi(p\|r) - I_\phi(q\|r)| &\le d(p, q) + h_r(p, q), \\
|D_\phi(p\|r) - D_\phi(q\|r)| &\le d(p, q) + e_r(p, q),
\end{aligned} \tag{41} $$

with $d(p, q)$ as before, and with

$$ \begin{aligned}
h_r(p, q) &= \sum_k |p_k - q_k| \ln_\phi(1/r_k), \\
e_r(p, q) &= -\sum_k |p_k - q_k| \ln_\phi(r_k).
\end{aligned} \tag{42} $$

The r.h.s. of (41) is the sum of two distances, each satisfying the
triangle inequality. Take q = r in (41) to obtain an upper bound for
$I_\phi(p\|q)$, resp. $D_\phi(p\|q)$.

7 A general continuity condition 

The r.h.s. of (38) resembles the entropy of a distribution with elements
$|p_k - q_k|$. Introduce therefore the symmetric difference $p\Delta q$ of
two distinct pdfs p and q by

$$ (p\Delta q)_k = \frac{|p_k - q_k|}{\|p - q\|_1}. \tag{43} $$

Note that $p\Delta q$ is again a pdf. Its elements satisfy
$(p\Delta q)_k \le 1/2$. This implies that

$$ I_\phi(p\Delta q) \ge -F_\phi(0) - \ln_\phi(1/2). \tag{44} $$

In Appendix B it is shown that from (38) follows that if
$\|p - q\|_1 < 1$ then

$$ |I_\phi(p) - I_\phi(q)| \le \frac{1}{F_\phi(0)}
   \bigl[ F_\phi(0) - F_\phi(\|p - q\|_1) \bigr]
   \bigl[ F_\phi(0) + I_\phi(p\Delta q) \bigr]. \tag{45} $$

If $\|p - q\|_1 = 1$, then this inequality coincides with (38).

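Take $\ln_\phi$ equal to the logarithm (24), used in the Tsallis context.
Then (45) becomes

$$ |I_\phi(p) - I_\phi(q)| \le \frac{1}{\kappa}
   \bigl[ (1 + \kappa) \|p - q\|_1 - \|p - q\|_1^{1+\kappa} \bigr]
   \bigl[ 1 + I_\phi(p\Delta q) \bigr]. \tag{46} $$

This is less sharp than (40), which can be written as

$$ |I_\phi(p) - I_\phi(q)| \le \frac{1}{\kappa} (1 + \kappa) \|p - q\|_1
   + \|p - q\|_1^{1+\kappa} \left[ I_\phi(p\Delta q) - \frac{1}{\kappa} \right]. \tag{47} $$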
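
The symmetric difference and the comparison between the two Tsallis bounds can be made concrete. The sketch below uses hypothetical function names and assumes the normalization $(p\Delta q)_k = |p_k - q_k| / \|p - q\|_1$, the closed-form Tsallis entropy, and the two bounds in the forms of (46) and (47).

```python
def symmetric_difference(p, q):
    """Return the pdf with elements |p_k - q_k| / ||p - q||_1 and the norm itself."""
    diffs = [abs(pk - qk) for pk, qk in zip(p, q)]
    norm = sum(diffs)
    if norm == 0.0:
        raise ValueError("p and q must be distinct")
    return [d / norm for d in diffs], norm


def tsallis_entropy(p, kappa: float) -> float:
    """Closed form (1/kappa) * (1 - sum_k p_k**(1 + kappa))."""
    return (1.0 - sum(pk ** (1.0 + kappa) for pk in p)) / kappa


def product_bound(p, q, kappa: float) -> float:
    """Bound in the product form of (46); intended for ||p - q||_1 < 1."""
    r, a = symmetric_difference(p, q)
    prefactor = ((1.0 + kappa) * a - a ** (1.0 + kappa)) / kappa
    return prefactor * (1.0 + tsallis_entropy(r, kappa))


def sharp_bound(p, q, kappa: float) -> float:
    """Bound in the form of (47), a rewriting of the distance d(p, q)."""
    r, a = symmetric_difference(p, q)
    return ((1.0 + kappa) * a / kappa
            + a ** (1.0 + kappa) * (tsallis_entropy(r, kappa) - 1.0 / kappa))


if __name__ == "__main__":
    p, q = [0.5, 0.3, 0.2], [0.45, 0.35, 0.2]
    r, a = symmetric_difference(p, q)
    print("p delta q:", r, "norm:", a)
    print(sharp_bound(p, q, 0.5), product_bound(p, q, 0.5))
```

The product form is weaker, but it separates the dependence on the distance $\|p - q\|_1$ from the dependence on the shape of the difference, which is what the condition formulated next relies on.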



In combination with (44), (45) shows that the entropy functional
$I_\phi(p)$ satisfies the following condition.

Condition 1 For each $\epsilon > 0$ there exists $\delta > 0$ such that

$$ |I(p) - I(q)| \le \epsilon\, I(p\Delta q) \tag{48} $$

holds for all pdfs p and q satisfying $p \ne q$ and
$\|p - q\|_1 < \delta$.

To show the relevance of this condition one consequence is highlighted.
Note that $(\lambda p + (1 - \lambda) q)\Delta q$ does not depend on
$\lambda$ in the range $0 < \lambda \le 1$. Hence Condition 1 implies that
for each $\epsilon > 0$ there exists $\delta > 0$ such that

$$ |I(\lambda p + (1 - \lambda) q) - I(\mu p + (1 - \mu) q)| \le \epsilon\, I(p\Delta q) \tag{49} $$

holds for distinct pairs p and q, and for all $\lambda$ and $\mu$ between
0 and 1, satisfying $|\lambda - \mu|\, \|p - q\|_1 < \delta$. This result
implies uniform continuity of entropy on the segment (p, q), provided
$I(p\Delta q)$ is finite.

8 Lesche's stability condition



Assume now that the number of microstates is finite, equal to N (i.e., the
index k of the pdfs p and q runs from 1 to N). Introduce the notation

$$ I^{\max}(N) = \max\{ I(p) : p_k = 0 \text{ for } k > N \}. \tag{50} $$

Lesche [2] showed twenty years ago that $I_0(p)$ satisfies the following
condition.

Condition 2 For each $\epsilon > 0$ there exists $\delta > 0$ such that

$$ |I(p) - I(q)| \le \epsilon\, I^{\max}(N) \tag{51} $$

holds for all pdfs p and q satisfying $\|p - q\|_1 < \delta$ and
$p_k = q_k = 0$ for k > N.

It is clear that an entropy function $I(p)$ satisfying Condition 1 also
satisfies Condition 2. For fixed N these conditions imply uniform
continuity, which is a rather trivial statement because a continuous
function on a compact set is automatically uniformly continuous. In
addition, (51) specifies how the estimate depends on the number of nonzero
components N.

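In the remainder of this section some inequalities, used in the literature
to prove Lesche's condition, are shown to follow from (38). In Appendix C
it is shown that (38) implies that

$$ \begin{aligned}
|I_\phi(p) - I_\phi(q)| &\le N F_\phi(0) - N F_\phi(N^{-1} \|p - q\|_1) \\
 &= -N \int_0^{\|p - q\|_1 / N} dx\, \ln_\phi(x) \\
 &= \|p - q\|_1 \bigl[ F_\phi(0) + \omega_\phi(N / \|p - q\|_1) \bigr].
\end{aligned} \tag{52} $$

It is difficult to bound $\omega_\phi(N / \|p - q\|_1)$ by
$I_\phi^{\max}(N) = \omega_\phi(N)$ in the general case using only that
$\ln_\phi(x)$ is a concave increasing function. However, in the case that
the deformed logarithm is given by (24), one has

$$ \omega_\phi(N / \|p - q\|_1) = \frac{1}{\kappa} \bigl( 1 - \|p - q\|_1^\kappa \bigr)
   + \|p - q\|_1^\kappa\, \omega_\phi(N). \tag{53} $$

This can be used to write (52) in the following form:

$$ |I_\phi(p) - I_\phi(q)| \le \|p - q\|_1 \left[ 1
   + \frac{1}{\kappa} \bigl( 1 - \|p - q\|_1^\kappa \bigr)
   + \|p - q\|_1^\kappa\, I_\phi^{\max}(N) \right]. \tag{54} $$

This is the result obtained recently by Abe [5]. It implies that
$I_\phi(p)$ satisfies Condition 2. In the limit $\kappa = 0$, (54) becomes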

$$ |I_0(p) - I_0(q)| \le \bigl( 1 + I_0^{\max}(N) \bigr) \|p - q\|_1
   - \|p - q\|_1 \ln(\|p - q\|_1). \tag{55} $$

This is the expression obtained originally by Lesche [2]. Fannes [3, 28]
showed that, if $\|p - q\|_1 < 1/3$, then one has the slightly stronger
inequality

$$ |I_0(p) - I_0(q)| \le I_0^{\max}(N) \|p - q\|_1
   - \|p - q\|_1 \ln(\|p - q\|_1). \tag{56} $$
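
Lesche's estimate for the standard entropy lends itself to a direct numerical check. The sketch below is illustrative, with hypothetical function names; it assumes the bound in the form $(1 + \ln N)\|p - q\|_1 - \|p - q\|_1 \ln \|p - q\|_1$, using that the maximal entropy on N states equals ln N.

```python
import math
import random


def shannon_entropy(p) -> float:
    """Entropy with the natural logarithm; zero-probability states contribute 0."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0.0)


def lesche_bound(p, q) -> float:
    """(1 + ln N) * a - a * ln(a), with a the total variation norm of p - q."""
    n = len(p)
    a = sum(abs(pk - qk) for pk, qk in zip(p, q))
    if a == 0.0:
        return 0.0
    return (1.0 + math.log(n)) * a - a * math.log(a)


def random_pdf(n: int, rng: random.Random):
    """Normalize n uniform random weights into a probability vector."""
    weights = [rng.random() for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    p = [1.0, 0.0, 0.0]
    q = [1.0 / 3.0] * 3
    print(abs(shannon_entropy(p) - shannon_entropy(q)), lesche_bound(p, q))
```

The bound grows only logarithmically with the number of states N, which is the content of the stability condition: for a fixed relative accuracy, the tolerated distance between p and q does not shrink as N increases.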
9 Discussion



The present paper considers a large class of entropy functionals. Their
definition is based on the concept of deformed logarithms. These entropies
have nice enough properties to enable the proof of useful estimates. Only
discrete pdfs have been considered. Expressions for continuous
distributions and for quantum probabilities are found in [21].

For each entropy functional $I_\phi(p)$ there exists a metric $d(p, q)$
majorizing the difference $|I_\phi(p) - I_\phi(q)|$ (see inequality (38)).
The difference of relative entropies $|I_\phi(p\|r) - I_\phi(q\|r)|$ is
majorized by the sum of two distances, the distance $d(p, q)$ mentioned
above, and a distance $h_r(p, q)$ which depends on the pdf r (see (41)).

An alternative definition of relative entropy, $D_\phi(p\|q)$, has been
proposed. It satisfies similar properties as $I_\phi(p\|q)$, but serves
other goals. It is used in generalized statistical physics to measure
changes in free energy. In information theory it is a measure of
redundancy.

Although the proof of (38) is rather elementary, the result can be used
to show that all entropy functionals, considered in the present paper, satisfy
Lesche's stability condition (Condition 2 of the paper), as well as a stronger
version of the inequality (Condition 1 of the paper). The proof is shorter and
more transparent than that of

Appendix A

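Here we prove the inequalities (38) and (41). Consider

$$ I_\phi(p) - I_\phi(q) = -\sum_k \int_{q_k}^{p_k} dx\, \ln_\phi(x). $$

If $p_k < q_k$ then the contribution is negative and may be omitted when
trying to obtain an upper bound. Hence one gets immediately, using
Heaviside's function $\theta(x)$,

$$ I_\phi(p) - I_\phi(q) \le -\sum_k \theta(p_k - q_k) \int_{q_k}^{p_k} dx\, \ln_\phi(x) \tag{A1} $$

$$ \le -\sum_k \theta(p_k - q_k) \int_0^{p_k - q_k} dx\, \ln_\phi(x)
   \le -\sum_k \int_0^{|p_k - q_k|} dx\, \ln_\phi(x). \tag{A2} $$

The first step in (A2) uses that $\ln_\phi(x)$ is increasing, the second
that it is negative on [0, 1]. The same bound holds with p and q
interchanged. This proves (38).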

To prove (41) note that from (31) follows

$$ I_\phi(p\|r) - I_\phi(q\|r) = \sum_k \int_{q_k}^{p_k} dx\, \ln_\phi(x/r_k). \tag{A3} $$

Assume $p_k < q_k$ and write the k-th term as

$$ -\int_{p_k}^{q_k} dx\, \ln_\phi(x/r_k). \tag{A4} $$

It increases when $-\ln_\phi(x/r_k)$ is replaced by $-\ln_\phi(x)$. Hence
the sum of all these terms is less than $d(p, q)$. On the other hand, if
$p_k > q_k$ then the factor $\ln_\phi(x/r_k)$ in the k-th term can be
replaced by $\ln_\phi(1/r_k)$, which yields the bound

$$ \int_{q_k}^{p_k} dx\, \ln_\phi(x/r_k) \le (p_k - q_k) \ln_\phi(1/r_k). \tag{A5} $$

The sum of these terms is bounded by $h_r(p, q)$. This finishes the proof
of the first inequality of (41).

In case of the alternative definition of divergence one has

$$ D_\phi(p\|r) - D_\phi(q\|r) = -I_\phi(p) + I_\phi(q) - \sum_k (p_k - q_k) \ln_\phi(r_k). \tag{A6} $$

Hence, in this case the estimate is straightforward.



Appendix B 

Here, expression (45) is derived. Note that any increasing concave
function $g(x)$, satisfying $g(0) \ge 0$, also satisfies

$$ g(\lambda x)\, g(y) \le g(x)\, g(\lambda y) \tag{B1} $$

for all $\lambda$, x, and y for which $0 \le \lambda \le 1$ and
$0 \le x \le y$ hold. Apply this result with
$g(x) = F_\phi(0) - F_\phi(x)$ (which is increasing on $0 \le x \le 1$),
$\lambda = \|p - q\|_1$, $x = |p_k - q_k| / \|p - q\|_1$, and y = 1. Note
that the assumption $\|p - q\|_1 < 1$ is needed here. There follows, using
$F_\phi(1) = 0$,

$$ \bigl[ F_\phi(0) - F_\phi(|p_k - q_k|) \bigr] F_\phi(0)
   \le \bigl[ F_\phi(0) - F_\phi(|p_k - q_k| / \|p - q\|_1) \bigr]
       \bigl[ F_\phi(0) - F_\phi(\|p - q\|_1) \bigr]. \tag{B2} $$

Using (38) this implies (45).

Appendix C

Here, inequality (52) is proved. Because $F_\phi$ is convex one has for
any x and a > 0

$$ F_\phi(x) \ge F_\phi(a) + (x - a) \ln_\phi(a). \tag{C1} $$

Therefore (38) implies

$$ |I_\phi(p) - I_\phi(q)| \le N F_\phi(0) - N F_\phi(a) - (\|p - q\|_1 - N a) \ln_\phi(a). \tag{C2} $$

The optimal choice of a is $a = N^{-1} \|p - q\|_1$. This implies (52).